A Fault-Tolerant Exascale Parallel Runtime
Related resources
The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications
Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support ...
Fault Tolerance Lessons Applied to Parallel Computing
This paper describes an approach to fault-tolerant parallel computing which is based on the experiences with the most successful fault-tolerant software – the transaction processing systems. The algorithms presented here have less runtime overhead and faster recovery than most preceding approaches. In the Pact parallel programming environment fault tolerance is provided fully user transparent i...
Multiscale computing in the exascale era
We expect that multiscale simulations will be one of the main high performance computing workloads in the exascale era. We propose multiscale computing patterns as a generic vehicle to realise load balanced, fault tolerant and energy aware high performance multiscale computing. Multiscale computing patterns should lead to a separation of concerns, whereby application developers can compose mult...
Fault-Tolerant Parallel Programming with Atomic Actions
The Pact (parallel actions) parallel programming environment provides an easy-to-use parallel execution and synchronization model based on task parallelization. To give the programmer an abstraction for global data (even on distributed memory machines) the Pact runtime system uses virtual shared memory. Execution’s efficiency is improved with data-dependent dynamic load balancing and latency-ma...
Concurrent C: real-time programming and fault tolerance
Concurrent C is an upward-compatible parallel extension of C which runs on a variety of uniprocessors and multiprocessors. A Concurrent C program consists of a set of processes which execute in parallel and interact with each other by sending messages. Fault-Tolerant (FT) Concurrent C, an extension of Concurrent C, is a tool for writing fault-tolerant distributed programs, based on the replicat...